LONG-TERM  VERIFICATION  TRENDS  OF  FORECASTS  BY  THE
NATIONAL  WEATHER  SERVICE

Duane  S.  Cooley  and  Robert  G.  Derouin 

ABSTRACT.   Averages  of  daily  forecast  verification 
records of the National Weather Service (NWS) (previously
the United States Weather Bureau) over a
period of many years are summarized and illustrated
below. Results of studying precipitation, temperature,
wind, and pressure predictions made during
the  past  two  or  three  decades  show  that  progress, 
though  sometimes  slow,  is  definitely  being  made  in 
improving  weather  forecasts. 

1.   INTRODUCTION 

This  paper  is  an  update  and  extension  of  earlier  reports  on  the  accuracy  of 
selected  National  Weather  Service  forecasts  (Roberts  et  al.  1967a,  and 
Cressman  1970).   The  graphs  in  this  report  were  produced  through  the  joint 
efforts of the National Weather Service Headquarters, the National
Meteorological Center (NMC), the National Hurricane Center (NHC), the National
Severe Storms Forecast Center (NSSFC), and selected forecast offices.
Section 2 illustrates progress in central guidance forecasts received by the
field  office  forecasters.   Section  3  summarizes  records  from  a  few  major 
cities  and  the  nation  as  a  whole.   Section  4  shows  the  records  relating  to 
tornadoes  and  hurricanes,  and  section  5  illustrates  the  progress  in  extended 
forecasting. Since there is no completely acceptable way of defining
forecast accuracy, a number of different verification indexes are used.

2.   GUIDANCE  FORECASTS 

Guidance  forecasts  produced  by  NMC  are  scored  in  a  number  of  different  ways, 
depending  on  the  forecast  parameter  and  the  manner  in  which  it  is  presented. 
Nearly  all  these  forecasts  have  shown  improvement  during  the  past  decade  or 
so. 

Maps that present the forecast shape of the sea level pressure pattern or the
mid-tropospheric pattern are used by the field forecast offices to guide
forecasters in making local and statewide weather forecasts. NMC scores these
maps using the S1 score (Teweles and Wobus 1954), and progress in forecasting
the maps is shown in figures 1 and 2. An arbitrary skill score in percent
is also shown in the figures.

The  progress  in  12-hour  forecasts  by  NMC's  Analysis  and  Forecast  Division 
(A&FD)  is  shown  in  figures  3  and  4.   The  verifications  are  based  on  the 
threat  score  which  is  the  ratio  of  the  correct  forecasts  to  the  number  of 
forecasts  and  occurrences,  less  the  correct  forecasts.   Figure  3  shows  the 
results  from  60  points  for  measurable  precipitation  while  figure  4  shows 
the  results  from  areas  with  4  or  more  inches  of  snow. 
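
As an illustration only, the threat score defined above can be sketched in a few lines of Python. The function name and the counts in the example are hypothetical and are not taken from the A&FD records; the formula itself follows the definition in the text (correct forecasts divided by forecasts plus occurrences, less the correct forecasts).

```python
def threat_score(correct: int, forecasts: int, occurrences: int) -> float:
    """Threat score = correct / (forecasts + occurrences - correct).

    correct     -- number of forecasts of the event that verified
    forecasts   -- total number of times the event was forecast
    occurrences -- total number of times the event was observed
    """
    denominator = forecasts + occurrences - correct
    if denominator == 0:
        # The event was neither forecast nor observed: score is undefined.
        return float("nan")
    return correct / denominator


# Hypothetical counts, for illustration only.
score = threat_score(correct=20, forecasts=35, occurrences=30)
print(f"threat score = {score:.3f}")
```

A perfect set of forecasts gives a score of 1, and a set with no correct forecasts gives 0. Correct forecasts of non-occurrence do not enter the score, which is one reason it is used for relatively rare events such as heavy snow.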

Figure 1.--NMC average 30-hour surface S1 scores

Figure 2.--NMC average 36-hour 500-mb S1 scores

Figure 4.--A&FD verification of heavy snow forecasts for a 12-hour period


3.   CITY  AND  NATIONAL  FORECASTS 

Long-term  verification  records  of  various  parameters  are  available  for  a 
few  stations,  and  for  a  sum  of  stations  over  the  conterminous  United 
States.   Nearly  all  these  records  show  improvement  in  skill  over  the 
years.   Figure  5  shows  a  decline  in  large  temperature  errors  at  Salt  Lake 
City  since  1949.   Figure  6  shows  a  gradual  improvement  in  precipitation 
and  temperature  forecasts  for  Chicago  since  1942,  reaching  near  90  percent 
correct in the late 1960's. Figures 7 and 8 show the improvement in
precipitation forecasting for Washington, D.C. since 1945, and nationally
(Alaska and Hawaii excluded) for 150 to 250 stations since 1959,
respectively. (Although the 0-12 hour score is not shown nationally, it is
currently about 87 percent, which is 5 percent better than in 1959.) The
latter  three  graphs  are  based  on  the  ratio  of  correct  forecasts  to  the 
total  forecasts.   A  probability  of  50  percent  or  more  is  considered  a 
precipitation  forecast.   Correct  precipitation  forecasts  involve  the 
occurrence  (non-occurrence)  of  measurable  precipitation  except  for  Chicago, 
where  a  trace  was  verified  either  way.   In  addition,  a  temperature  forecast 
for Chicago was considered to be correct if it was within ±10 degrees (or
less, depending on the season) of the observed. Since 1966 the data for the
150  to  250  stations  are  from  the  official  verification  program  (Roberts 
et  al.  1967b);  the  yearly  value  represents  the  period  April  through  March. 
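
The percent-correct measure used for figures 6 through 8 can be sketched as follows. This is a minimal illustration, assuming each forecast is available as a probability in percent and each verification as a yes/no flag for measurable precipitation; the function name and the sample values are hypothetical. The 50 percent threshold is the one stated above.

```python
def percent_correct(probabilities, precip_observed, threshold=50.0):
    """Percent of forecasts that verified.

    A probability at or above `threshold` (in percent) counts as a forecast
    of precipitation; anything lower counts as a forecast of no precipitation.
    A forecast is correct when it matches the observed occurrence
    (or non-occurrence) of measurable precipitation.
    """
    if len(probabilities) != len(precip_observed):
        raise ValueError("forecasts and observations must have equal length")
    if not probabilities:
        raise ValueError("at least one forecast is required")
    correct = sum(
        (prob >= threshold) == bool(observed)
        for prob, observed in zip(probabilities, precip_observed)
    )
    return 100.0 * correct / len(probabilities)


# Hypothetical sample: five forecasts and their verifications.
probs = [70, 20, 50, 40, 90]
observed = [True, False, False, True, True]
print(percent_correct(probs, observed))
```

Because correct forecasts of no precipitation count equally, this measure tends to run high in dry climates, which should be kept in mind when comparing stations.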


4.   TORNADO  AND  HURRICANE  STATISTICS 

The  most  destructive  storms,  in  loss  of  life  and  damage  to  property,  are 
tornadoes  and  hurricanes.   Property  damage  and  deaths  from  these  storms, 
by  5-year  periods,  are  shown  in  figures  9  and  10,  respectively.   Both 
figures  show  a  steady  increase  in  the  economic  loss,  and  a  decrease  in 
the number of deaths from the early 1900's until the present. Figure 11
shows the number of tornadoes and tornado deaths normalized to the
frequency per million population. As the population expanded, more tornadoes
were reported, but there were fewer deaths per million. The decrease in
the death rate from tornadoes and hurricanes is a reflection of the
forecasts and warnings issued by the NSSFC in Kansas City, the NHC in Miami,
and the field offices. In addition, both the reporting network and
communication system have improved over the years.

The  accuracy  of  the  official  24-hour  hurricane  forecasts  is  shown  in 
figure 12. Since the mid-1950's, the mean errors of the hurricane tracks
have decreased by nearly 33 percent. It should be pointed out, however,
that the verification system was changed, since the forecasts from 1956-67
were verified 28 hours after data time while those from 1968-71 were
verified 24 hours after data time.


Figure 5.--Annual number of maximum temperature errors equal to or greater
than 10°F for Salt Lake City

Figure 6.--Percent of correct precipitation and temperature forecasts for
Chicago, based on 2 forecasts per day over 2 periods

Figure 7.--Percent of correct precipitation forecasts for Washington,
D.C., based on 4 forecasts per day over 3 periods

Figure 8.--Percent of correct local precipitation and no precipitation
forecasts nationally (average of 150 to 250 stations)

Figure 9.--Tornado damage in millions of dollars (adjusted to
1968) and tornado deaths; 5-year averages

Figure 10.--Hurricane damage in millions of dollars (adjusted to
1957-59) and hurricane deaths; 5-year averages

Figure 11.--Mean annual tornadoes and tornado deaths
per million population

Figure 12.--All official 24-hour National Weather Service tropical
storm/hurricane forecasts; 4-year averages
* The forecasts from 1956-67 were verified 28 hours after data time.
The forecasts from 1968-71 were verified 24 hours after data time.


5.   EXTENDED  FORECASTS

The long-term performance accuracy of 5-day precipitation and temperature
forecasts is measured statistically by a "skill score." This score,
frequently used in forecast verification, represents the mean performance
over the 5-day period measured by the ratio of correct to total forecasts,
less the expected correct (based on past climatology). These 5-day mean
forecasts were for ranges of temperature designated as much below normal,
below normal, normal, above normal, and much above normal. Precipitation
forecasts were for 5-day totals of light, moderate, and heavy, or none,
moderate, and heavy, except for the driest regions where the forecast
called for precipitation or no precipitation.
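
The skill score described above can be sketched as follows. The text gives the score only in words, so this sketch assumes the common form in which the number of forecasts expected to be correct from climatology is subtracted from both the number correct and the total; that reading, the function name, and the sample counts are assumptions made for illustration, not values from the verification records.

```python
def skill_score(correct: float, total: float, expected: float) -> float:
    """Skill score = (correct - expected) / (total - expected).

    correct  -- number of forecasts that verified
    total    -- total number of forecasts
    expected -- number expected to verify from climatology alone
                (assumed to be supplied by the caller)
    """
    if total == expected:
        raise ValueError("score is undefined when expected equals total")
    return (correct - expected) / (total - expected)


# Hypothetical counts: 100 forecasts, 45 correct, 25 expected by climatology.
print(f"skill score = {skill_score(45, 100, 25):.3f}")
```

Under this form a score of 0 means the forecasts did no better than climatology, 1 means every forecast verified, and negative values mean the forecasts did worse than climatology.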

Figure  13  shows  the  skill  scores  for  5-day  precipitation  and  temperature 
forecasts since 1947 and indicates that temperature is easier to forecast.
The figure also illustrates the improvement in forecast skill with
technological advancements. These advancements, in the form of numerical models
of the atmosphere and supplementary statistical equations with forecast
solutions obtained by computer, were responsible after the mid-1950's for
improved and more timely meteorological guidance to the extended-range
forecaster. Correspondingly, forecasting skill rose during this period.
After  1966,  a  new  and  more  sophisticated  model  and  larger  and  improved 
computers  came  into  being,  leading  to  still  better  guidance.   Further 
improvement  in  extended  range  forecasting  is  shown  by  still  higher  skill 
scores. 

The above discussions of skill are based on verification scores of
forecasts made through 1969 when 5-day mean forecasts were issued 3 days a
week. Beginning February 9, 1970, the extended forecast product is new,
more detailed, and different. Daily flow patterns, precipitation, and
maximum/minimum temperature forecasts out to 5 days are now issued 7 days
a week. Because of the different format, new temperature and
precipitation verification scores are not strictly comparable to the old, but
qualitatively the forecast skill is about the same.

A  comparison  of  daily  sea-level  pressure  charts  is  shown  in  figure  14. 
This  is  the  only  direct  historical  comparison  that  can  be  made  between 
the new and old extended forecast packages. Skill is measured by
correlating the standardized differences between the forecast and observed
maps, less the normal map. The new program shows slight improvement for
days  3  and  4,  but  not  day  5.   However,  persistence  shows  less  skill  for 
the  second  2-year  period  than  for  the  first.   The  forecasts  for  the 
second  period  show  greater  skill  over  persistence.   At  least  it  would 
seem  safe  to  say  that  the  forecasts  for  the  two  periods  have  about  equal 
skill,  while  the  productivity  increased  by  a  factor  of  7/3. 
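
The correlation measure described above can be illustrated with a short sketch. It assumes that both the forecast and the observed pressure at each grid point are expressed as departures from the normal, divided by a local standard deviation, and that the two sets of standardized departures are then compared with an ordinary (centered) correlation coefficient. The exact procedure used for figure 14 is not given in the text, so the function name, the centering, and the sample pressures below are all assumptions.

```python
import math


def anomaly_correlation(forecast, observed, normal, std_dev):
    """Correlate standardized departures from normal at a set of grid points."""
    # Standardized departure of each map from the normal map.
    f_anom = [(f - n) / s for f, n, s in zip(forecast, normal, std_dev)]
    o_anom = [(o - n) / s for o, n, s in zip(observed, normal, std_dev)]

    mean_f = sum(f_anom) / len(f_anom)
    mean_o = sum(o_anom) / len(o_anom)

    covariance = sum((f - mean_f) * (o - mean_o) for f, o in zip(f_anom, o_anom))
    var_f = sum((f - mean_f) ** 2 for f in f_anom)
    var_o = sum((o - mean_o) ** 2 for o in o_anom)
    if var_f == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant anomaly field")
    return covariance / math.sqrt(var_f * var_o)


# Hypothetical sea-level pressures (mb) at four grid points.
normal = [1012.0, 1015.0, 1010.0, 1008.0]
std_dev = [4.0, 5.0, 4.0, 6.0]
observed = [1016.0, 1010.0, 1014.0, 1002.0]
forecast = [1015.0, 1012.0, 1013.0, 1005.0]
print(f"correlation = {anomaly_correlation(forecast, observed, normal, std_dev):.2f}")
```

A persistence forecast can be scored the same way by passing the initial-day map as the forecast, which allows the comparison of forecast skill against persistence discussed above.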


6.   SUMMARY  AND  CONCLUSIONS 

The  results  of  studying  precipitation,  temperature,  wind,  and  pressure 
forecasts  made  during  the  past  two  or  three  decades  show  that  progress, 
though  sometimes  slow,  is  definitely  being  made  in  improving  weather 
predictions.   Much  of  this  progress  resulted  from  the  use  of  numerical 
models  and  high  speed  computers.   In  addition,  most  of  the  improvement 
has been made for forecasts beyond 12 hours. As yet, there has been
little improvement in short-range forecasts on the mesoscale (aviation
terminals and severe storms), and it is in these areas where more
emphasis is needed.

Figure 13.--Five-day extended forecasts to 1969

Figure 14.--Extended forecast surface pressure prog verifications